Published as a conference paper at ICLR 2020
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David
Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. arXiv
preprint arXiv:1611.05397, 2016.
Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M Czarnecki, Jeff Donahue, Ali
Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training
of neural networks. arXiv preprint arXiv:1711.09846, 2017.
Steven Kapturowski, Georg Ostrovski, Will Dabney, John Quan, and Remi Munos. Recurrent
experience replay in distributed reinforcement learning. In International Conference on Learning
Representations, 2019.
Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, and Hyun Oh Song. Emi:
Exploration with mutual information maximizing state and action embeddings. arXiv preprint
arXiv:1810.01176, 2018.
Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot
image recognition. In ICML deep learning workshop, volume 2, 2015.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare,
Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control
through deep reinforcement learning. Nature, 518(7540):529, 2015.
Rémi Munos, Tom Stepleton, Anna Harutyunyan, and Marc Bellemare. Safe and efficient off-policy
reinforcement learning. In Advances in Neural Information Processing Systems, pp. 1046–1054,
2
016.
Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L Lewis, and Satinder Singh. Action-conditional
video prediction using deep networks in atari games. In Advances in neural information processing
systems, pp. 2863–2871, 2015.
Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. Deep exploration via
bootstrapped dqn. In Advances In Neural Information Processing Systems, pp. 4026–4034, 2016.
Georg Ostrovski, Marc G Bellemare, Aäron van den Oord, and Rémi Munos. Count-based exploration
with neural density models. In Proceedings of the 34th International Conference on Machine
Learning-Volume 70, pp. 2721–2730. JMLR. org, 2017.
Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration
by self-supervised prediction. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pp. 16–17, 2017.
Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden,
Gabriel Barth-Maron, Hado Van Hasselt, John Quan, Mel Ve cˇ erík, et al. Observe and look further:
Achieving consistent performance on atari. arXiv preprint arXiv:1805.11593, 2018.
Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech, Oriol Vinyals, Demis
Hassabis, Daan Wierstra, and Charles Blundell. Neural episodic control. ICML, 2017.
Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lilli-
crap, and Sylvain Gelly. Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274,
2
018.
Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. Universal value function approximators.
In International Conference on Machine Learning, pp. 1312–1320, 2015a.
Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. CoRR,
abs/1511.05952, 2015b.
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche,
Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering
the game of go with deep neural networks and tree search. nature, 529(7587):484–489, 2016.
12
升级至LINER Premium,并撰写无限数量的评论。